Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems

Abstract

This paper presents the single kernel multiple devices (SKMD) system, a framework that transparently orchestrates the collaborative execution of a single data-parallel kernel across multiple asymmetric CPUs and GPUs. SKMD is an abstraction layer that sits between applications and the OpenCL library, and it uses OpenCL as its intermediate language. It transparently partitions an OpenCL kernel across multiple devices. In doing so it accounts for data-transfer costs and for how each device's performance varies with the workload. It then launches the parallel kernels and automatically merges the partial results into the final output. The system not only eliminates the tedious process of re-engineering applications when the hardware changes, but also makes efficient partitioning decisions based on application characteristics, input sizes, and the underlying hardware.

The dynamic compiler of SKMD has three main components: the Kernel Transformer, the Buffer Manager, and the Partitioner. The kernel transformer converts the original kernel into a partition-ready kernel, which executes only a subset of the work-groups. After kernel transformation, the buffer manager statically analyzes the kernel to determine the memory access pattern of each work-group. If the memory access range of each work-group can be determined statically, the buffer manager transfers only the necessary data. If it cannot, the entire input must be transferred to each device and the outputs must be merged. To merge output that different devices write to irregular locations, the kernel transformer generates a merge kernel, which SKMD launches on the CPU device.
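The abstract does not describe how the Partitioner weighs transfer cost against device speed. The following Python sketch only illustrates the general idea under a simplified linear cost model, which is an assumption and not SKMD's actual algorithm. Each device has a hypothetical profiled throughput (`wg_per_ms`) and a hypothetical per-work-group transfer cost (`transfer_ms_per_wg`, zero for the host CPU). The split of work-groups is chosen to minimize the finishing time of the slower device.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Device:
    name: str
    wg_per_ms: float           # hypothetical profiled throughput (work-groups per ms)
    transfer_ms_per_wg: float  # hypothetical transfer cost per work-group (0 for the host CPU)

    def time_ms(self, n_wg: int) -> float:
        """Linear estimate of compute time plus transfer time for n_wg work-groups."""
        return n_wg / self.wg_per_ms + n_wg * self.transfer_ms_per_wg


def partition(n_wg: int, cpu: Device, gpu: Device) -> Tuple[int, int]:
    """Return (cpu_wgs, gpu_wgs) minimizing the estimated time of the slower device."""
    gpu_wgs = min(range(n_wg + 1),
                  key=lambda g: max(cpu.time_ms(n_wg - g), gpu.time_ms(g)))
    return n_wg - gpu_wgs, gpu_wgs


cpu = Device("cpu", wg_per_ms=1.0, transfer_ms_per_wg=0.0)
gpu = Device("gpu", wg_per_ms=4.0, transfer_ms_per_wg=0.05)
split = partition(130, cpu, gpu)
print(split)  # (30, 100)
```

Without the transfer term, the GPU would receive 104 of the 130 work-groups. Charging for data movement shifts some work back to the CPU. This is the kind of transfer-aware decision the abstract describes.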
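The following Python sketch illustrates the idea behind a partition-ready kernel and range-based buffer transfer. It does not use SKMD's or OpenCL's actual code. A launch runs only the work-groups in `[wg_start, wg_end)` and rebuilds each global id from the absolute work-group index, so the kernel body is unchanged. Because `vec_add` writes exactly `out[gid]`, each device's written range is known statically, and only that slice is copied back. The names `vec_add`, `run_partition`, and `written_range` are hypothetical, and Python lists stand in for device buffers.

```python
def vec_add(gid, a, b, out):
    """Original kernel body: each work-item writes exactly out[gid]."""
    out[gid] = a[gid] + b[gid]


def run_partition(kernel, wg_start, wg_end, wg_size, *buffers):
    """Partition-ready launch: execute only work-groups [wg_start, wg_end).

    The global id is rebuilt from the absolute work-group index, so the
    kernel body itself is unchanged.
    """
    for group in range(wg_start, wg_end):
        for local_id in range(wg_size):
            kernel(group * wg_size + local_id, *buffers)


def written_range(wg_start, wg_end, wg_size):
    """Output slice written when every work-item writes out[gid]."""
    return wg_start * wg_size, wg_end * wg_size


n, wg_size = 8, 2                            # 4 work-groups in total
a, b = list(range(n)), [10] * n
host_out = [0] * n
for start, end in [(0, 1), (1, 4)]:          # hypothetical CPU/GPU split
    device_out = [0] * n                     # stand-in for a device-side buffer
    run_partition(vec_add, start, end, wg_size, a, b, device_out)
    lo, hi = written_range(start, end, wg_size)
    host_out[lo:hi] = device_out[lo:hi]      # copy back only that slice
print(host_out)  # [10, 11, 12, 13, 14, 15, 16, 17]
```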
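The abstract says that when write locations cannot be analyzed statically, a merge kernel combines the per-device outputs, but it does not say how. The following Python sketch shows one possible scheme, which is an assumption here: a diff against the original output buffer. Every device starts from a copy of the original output, and no two devices write the same element. Under those assumptions, any element that differs from the original must have been written by that device. The `scatter` kernel and `merge` function are hypothetical names, and the merge runs sequentially rather than as a real CPU kernel.

```python
def scatter(gid, idx, val, out):
    """Irregular kernel: the write location depends on runtime data in idx."""
    out[idx[gid]] = val[gid]


def merge(original, device_outputs):
    """Diff-based merge (assumed scheme, not necessarily SKMD's).

    Assumes every device started from a copy of `original` and that the
    devices wrote disjoint elements.
    """
    merged = list(original)
    for out in device_outputs:
        for i, (before, after) in enumerate(zip(original, out)):
            if after != before:              # element was written by this device
                merged[i] = after
    return merged


original = [0] * 6
idx, val = [4, 0, 5, 2, 1, 3], [1, 2, 3, 4, 5, 6]
copy_a, copy_b = list(original), list(original)
for gid in range(0, 3):                      # hypothetical device A: work-items 0..2
    scatter(gid, idx, val, copy_a)
for gid in range(3, 6):                      # hypothetical device B: work-items 3..5
    scatter(gid, idx, val, copy_b)
result = merge(original, [copy_a, copy_b])
print(result)  # [2, 5, 4, 6, 1, 3]
```

A device that happens to write a value equal to the original leaves that element unchanged, and the merged result is still correct. The scheme breaks only if two devices write the same element.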

Similar Resources

Performance Portability in Accelerated Parallel Kernels

Heterogeneous architectures, by definition, include multiple processing components with very different microarchitectures and execution models. In particular, computing platforms from supercomputers to smartphones can now incorporate both CPU and GPU processors. Disparities between CPU and GPU processor architectures have naturally led to distinct programming models and development patterns for...

PSkel: A stencil programming framework for CPU-GPU systems

The use of Graphics Processing Units (GPUs) for high-performance computing has gained growing momentum in recent years. Unfortunately, GPU-programming platforms like CUDA are complex, user unfriendly, and increase the complexity of developing high-performance parallel applications. In addition, runtime systems that execute those applications often fail to fully utilize the parallelism of modern...

Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU

Digital Breast Tomosynthesis (DBT) is a technology that creates three dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is in the order of seconds, we can use Tomosynthesis systems to perform Tomosynthesis-guided Interventional procedures. This research has been designed to study u...

Protecting Real-Time GPU Applications on Integrated CPU-GPU SoC Platforms

Integrated CPU-GPU architecture provides excellent acceleration capabilities for data parallel applications on embedded platforms while meeting the size, weight and power (SWaP) requirements. However, sharing of main memory between CPU applications and GPU kernels can severely affect the execution of GPU kernels and diminish the performance gain provided by GPU. For example, in the NVIDIA Jetso...

Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal

Noise removal operation is commonly applied as pre-processing step before subsequent image processing tasks due to the occurrence of noise during acquisition or transmission process. A common problem in imaging systems by using CMOS or CCD sensors is appearance of the salt and pepper noise. This paper presents Cellular Automata (CA) framework for noise removal of distorted image by the salt an...

Publication date: 2014